Improving the efficiency of extended ensemble simulations: 
The accelerated weight histogram method 
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We propose a method for efficient simulations in extended ensembles, useful, e.g., for the study 
of problems with complex energy landscapes and for free energy calculations. The main difficulty 
in such simulations is the estimation of the a priori unknown weight parameters needed to produce 
flat histograms. The method combines several complementary techniques, namely, a Gibbs sampler 
for the parameter moves, a reweighting procedure to optimize data use, and a Bayesian update 
allowing for systematic refinement of the free energy estimate. In a certain limit the scheme reduces 
to the l/t algorithm of B.E. Belardinelli and V.D. Pereyra [Phys. Rev. E 75, 046701 (2007)]. The 
performance of the method is studied on the two-dimensional Ising model, where comparison with 
the exact free energy is possible, and on an Ising spin glass. 

PACS numbers: 05.10.-a, 02.70.-c, 05.10.Ln 
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I. INTRODUCTION 

The complex behavior of models with rough energy 
landscapes (such as spin glasses, biopolymers, etc.) is an 
important but challenging problem. In many situations 
progress is possible only using computer simulations, but 
this too is a notoriously difficult problem. In order to effi- 
ciently sample the equilibrium distribution of such mod- 
els it is necessary to overcome the barriers separating dif- 
ferent metastable minima, a process which can be very 
slow if the temperature is low. A particularly fruitful 
strategy to enhance the sampling is to enlarge the con- 
figuration space to include some well-chosen parameter(s) 
in the model. In simulated tempering [l|, e.g., the tem- 
perature is promoted to a dynamical variable, whereby 
the system heats up and cools down randomly and gets 
a good chance to explore the energy landscape. Such ex- 
tended ensemble or generalized ensemble methods have 
gained much attention recently and are routinely used 
in simulations of such diverse problems as spin glasses, 
biomolecules, and problems in statistics. The methods 
are also highly useful for free energy calculations and for 
the estimation of the probability of extreme events. An 
attractive feature is that they can easily be incorporated 
into existing simulation methods. The downside is, how- 
ever, that in order to work properly they require fine 
tuning certain a priori unknown weights. The weights 
must be tuned to ensure that each parameter value (e.g. 
temperature) of the extended ensemble is visited equally 
often on average. They are simply related to the free 
energy at the given parameter value, and are therefore 
a highly useful byproduct of the simulation, if they can 
be estimated efficiently using some scheme. While sev- 
eral such schemes have been constructed [l|, [|[ there is a 
strong need for improvements. In this paper we propose 
one such scheme with a number of distinct advantages. 



Section Q] gives a brief background on extended en- 
sembles and discusses some shortcomings of existing 
methods. Section [TTJ introduces an improved method, 
the accelerated weight histogram method. In Sec. IIIII 
the method is tested and benchmarkcd on two model 
problems, the two-dimensional Ising model and a three- 
dimensional Ising spin glass. 



A. Extended ensembles 

We consider a model described by a probability distri- 
bution tt\ (x) , which depends on one or more parameters 
A. Typically we want to study the model for a whole 
range of parameter values. In an extended ensemble sim- 
ulation, states are sampled according to a joint distribu- 
tion P(x, A), which we express, without loss of generality, 
as 

P(x,m) = 4e /m_Em(x) , (1) 



where x € X denotes the configuration of the system and 
we assume a discrete set of preselected parameter values 
X m € M. = {Xi, X2, ■ ■ ■ , Xm}- The weights e* m intro- 
duced in Eq. (|T|) allow tuning the marginal distribution 
P(m) to approach any desired form. We assume that we 
have a way of generating samples from the conditional 
distribution 



P(x\m) = 7r m (x) = e i 



-E m [x) 



(2) 
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at fixed parameter A m , using, e.g., Markov chain Monte 
Carlo (MC) or molecular dynamics (MD) methods. Gen- 
erally this can be done without knowledge of the normal- 
ization constants e~ Fm . In physics applications Eq. ^ is 
often the ordinary canonical distribution ~ e~ £ / T , where 
we absorbed the temperature into the energy in order to 
treat it on equal footing as any other parameter of the 
system. Likewise, F m denotes the dimensionless free en- 
ergy O- ln Bayesian statistics problems Eq. (J2) is typi- 
cally the a posteriori distribution for the model parame- 
ters and possibly missing data given a set of observations. 
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The ordinary (MC or MD) moves are then comple- 
mented with transitions in parameter space, which in 
most cases consist of a nearest neighbor random walk. 
The weights e^ m need to be adjusted to make the 
marginal distribution 

P(m) = J2 p M = je^- F - (3) 

X 

of m approximately flat Q • This requires f m ~ F m where 
F m is the exact (dimensionless) free energy at A m , un- 
known at the beginning of the simulation. 

Quite generally existing methods to estimate the 
weights e fm can be divided into two different classes, it- 
erative and dynamic. In an iterative method the weights 
are produced in a sequence of preliminary runs, each run 
giving a better estimate than the old, until sufficient ac- 
curacy is reached. On the other hand, in a dynamic 
scheme the weights are being continuously updated dur- 
ing a long simulation. The dynamic schemes have po- 
tential for faster convergence, but since the weights are 
constantly changing, detailed balance is violated and the 
samples collected cannot therefore be safely used to esti- 
mate average values of interest. In the iterative scheme 
the weights are fixed during each run, and only updated 
between the runs. 



B. The Wang-Landau and 1/t methods 

One particularly elegant dynamic scheme is the Wang- 
Landau method [2[, originally developed for simulations 
in the closely related multicanonical ensemble [f| . In this 
ensemble the state space is not extended, but instead one 
replaces the Boltzmann weights of the ordinary canoni- 
cal ensemble with a different one aimed at producing a 
flat histogram of some quantity A (a;), usually the energy. 
From an algorithmic point of view the main difference is 
that the elementary moves x — > x' also change the value 
of A(x), e.g., the energy, whereas in the extended ensem- 
ble method they can be performed at constant A. The 
latter allows for more flexibility when choosing the pa- 
rameter moves, something we exploit below. The Wang- 
Landau method is straightforward to adapt to extended 
ensemble simulations (as demonstrated in Ref. 0) . Each 
time the system visits a particular parameter m, the cor- 
responding free energy parameter f m is decreased by a 
certain amount, f rn -s— f rn — Sf. A histogram of vis- 
ited parameter values is collected and Sf is reduced by a 
factor, Sf <— Sf /2, when the histogram meets a certain 
flatness criteria. Then the histogram is reset and the pro- 
cess starts over with the reduced modification constant. 
The scheme where Sf is halved each iteration anticipates 
an exponential convergence of f m to its true value F m . 
Unfortunately, this is not the case. Instead, the error 
saturates at a level where reasonably flat histograms are 
produced, but the free energy estimate no longer im- 
proves since Sf becomes too small [7, 8]. It has been 



realized Q that the modification factor should rather be 
decreased at a slow steady rate Sf ~ 1/t, where t is the 
Monte Carlo time, without regard to any histograms, at 
least during the later stages of a simulation. The result- 
ing 1/t method turns out to perform very well, both in 
multicanonical and extended ensemble simulations. 



C. Open issues 

Nevertheless, there is still plenty of room for improve- 
ments. What is, for example, the most efficient way to 
move around in parameter space? How can the data col- 
lected during the simulations be used most effectively to 
produce an estimate of the free energy needed for uniform 
sampling? How should the estimates from different itera- 
tions, perhaps run in parallel, be combined in an optimal 
way? How should the set of parameters M. be chosen? 
Most often the parameter moves form a nearest neighbor 
random walk, and then the choice of the spacing between 
adjacent values may be a critical issue. Having too large 
gaps between adjacent values may lead to small accep- 
tance rates and therefore very slow dynamics along the 
parameter axis. Having too densely spaced parameter 
values, on the other hand, can make the dynamics of the 
random walk itself a limiting factor, again slowing down 
the dynamics. 



II. THE ACCELERATED WEIGHT 
HISTOGRAM METHOD 

In this paper we propose an iterative scheme — the 
accelerated weight histogram method (AWH) — which 
combines several different complementary techniques to 
give a very efficient method which addresses the issues 
mentioned above. First of all, we allow large param- 
eter steps by the use of a Gibbs sampler (a.k.a. heat 
bath algorithm). This is combined with a reweighting 
procedure which makes optimal use of the information 
collected during the moves. Together these make it pos- 
sible to choose a rather densely spaced set of parameters, 
without being limited by slow diffusion. The free en- 
ergy parameters are updated based on a histogram of 
weights (rather than a histogram of visited parameter 
values) combined with the information collected during 
previous iterations. 

The parameter moves are carried out as follows. In 
the simplest case we allow transitions m — > m! to any 
new parameter value A m ' , with a probability given simply 
by the conditional probability of m! given the current 
configuration x 

e -E m ,{x)+f m , 

w m , m {x) = P{m'\x) = - e _ Ek(x)+fk - (4) 
The transition probabilities just calculated are accumu- 
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lated in a histogram of weights 



W k <- W k +w km (x), Vfc. 



(5) 



Further, they can be used for on-the-fly reweighting of 
sampled observables 



J2 t A(xt,k)w kmt (xt) 



(6) 



where {xt,rrit} denote the time series of visited configu- 
rations. The averages (A) k at a particular value Afc thus 
get contributions from a whole range of parameter val- 
ues. Note that the validity of Eq. ([BJ does not depend on 
the f m being converged. This reweighting scheme is akin 
to the optimal multihistogram reweighting technique of 
Ferrenberg and Swendsen Q (but with no need to solve 
a nonlinear equation system). 

The update procedure continues in an iterative way. 
During each iteration a certain number, say Nj, of sam- 
ples are collected and then the free energy parameters 
are updated as f k <— f k + Af k ,Vk, with 



(7) 



where N is the total number of samples collected so 
far. The weight histogram is then updated to reflect this 
change 



W k <- W k e Afk = N/M, 



(8) 



i.e., the total weight collected is distributed evenly among 
the M parameter values, and the next iteration starts. 
Note that the identity N — J2 k W k holds before and 
after the update. The histogram is thus not reset to zero 
after the iteration but continues to grow. This makes the 
updates Eq. become smaller and smaller and allows 
for finer and finer details of the free energy to be resolved. 

Equations (|4]) to (|8]) form the core of the algorithm, 
which can be summarized as follows: 

1. Perform N x updates of the configurations x at fixed 
parameter value A m . 

2. Perform a parameter move m — > m' using the 
Gibbs sampler, Eq. (j4}. 

3. Update the weight histogram using Eq. ([5} and 
sample any observables of interest using Eq. ([6]) . 

4. Repeat steps [T]|3] until Nx samples have been ob- 
tained. 

5. Update the free energy parameters f m using Eq. (0 
and the weight histogram using Eq. (0 . 

6. Start a new iteration from step 1 unless the desired 
accuracy has been reached. 



One possible concern is that step [2] of the algorithm re- 
quires the computation of M — \A4\ different quanti- 
ties, which can become time consuming if the set A4 is 
large (as can easily happen in the case of two- or higher- 
dimensional parameter spaces). In practice, w m i m (x) will 
be exponentially small except for a range of m! close to 
m. If this is the case one may limit the search for the 
new state to a neighborhood A C M of m by replacing 
step [2] with 

2' Choose a subset of parameter values A with prob- 
ability P(A\m). Perform a parameter move m — > 
m! G A using the Gibbs sampler [Eq. fl2J), but with 
the sum restricted to A]. 

Detailed balance is maintained if P(A\m) = P(A\m') for 
all m, m! € A. A simple choice (in the one-dimensional 
case) is to select a range of parameter values as an in- 
terval A = {m — L,...,m — L + R}nM, where L is a 
random uniformly distributed integer in [0, R] and R is 
a predetermined range. The generalization to higher- 
dimensional parameter spaces is straightforward. 



A. Bootstrapping the simulation 



Clearly the update Eq. (0 requires an initial guess for 
f k and a positive value of W k — W pi i x at the start of the 
simulation. This latter value can be seen as a Bayesian 
prior of our initial guess of f k , which is later on updated 
as new data becomes available. If we have reason to 
believe that the starting estimate of the free energy is 
good (e.g., because the free energy is expected to have 
small variations), we can use a large Wprior- I n many 
applications, however, our initial guess is going to be poor 
and we need some kind of bootstrap to get an acceptable 
prior. We propose the following heuristic scheme: Carry 
out the same steps in the simulation as above, but in 
addition check, after each iteration has completed (after 
step [5]), whether all parameter values have been visited a 
certain fixed (usually small ~ 1-10) number of times. If 
not, reset the number of samples N <— M', where M' is 
the number of parameters visited so far, and let W k 
N/M = M'/M. In this way the weight histogram does 
not start to accumulate data until M' = M whereby the 
free energy parameters will get relatively large updates 
at the initial stages. Also one should avoid sampling 
observables during this initial stage. Alternatively one 
may use free energy perturbation or a few Wang-Landau 
iterations to get a reasonable initial estimate of f m . After 
this, the simulation may proceed with an initial prior 
W k = 1. 

It is further recommended to make each iteration 
quite short, consisting of only Nx ~ 100-1000 param- 
eter moves, during this initial stage. (Later on it may be 
increased.) It is also advisable to monitor the histogram 
H m of visited parameter values, although it is not used 
directly to update the free energy. The robustness of 
the algorithm can then be increased by restarting the 
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simulation if the histogram gets too skewed, e.g., if the 
minimum value H min is less than a certain fraction of the 
mean. This could be an indication that initial nonequi- 
librium transients have distorted the distribution of the 
collected samples, which would violate the main assump- 
tion of the algorithm, namely that the samples collected 
during each iteration follow Eq. ([3]). If this happens one 
should reset the weight histogram and the effective num- 
ber of samples (e.g., Wk H m \ a , N <— AlH m i n or per- 
haps even Wk <— 1, N <— M), to allow the simulation to 
recover from that situation. 



B. Combining several simulations 

Often it is advantageous to run simulations in paral- 
lel to make efficient use of computational resources. The 
scheme introduced above can easily be adapted to such 
situations. Each computing node (n) runs an indepen- 
dent simulation (consisting of N n samples) leading to an 

estimate fm of the free energy parameters. These may 
then be combined into a best estimate F rn 



where 2^ = e^* and AT is an unimportant nor- 
malization constant. This equation is easily solved by 
iterating 



r(n) 



(9) 



F <— F 



In 



M£ n JV» ( 



2^irn,n Jv " e 



(10) 



starting from one of the fm (and this usually converges 
within 2-5 iterations). This way of organizing the simu- 
lation also has the advantage that statistical errors can 
be estimated using the standard jackknife method [T3| 
applied to Eq. ([TP]). 



C. Relation to the 1/t method 

Many variations of the basic algorithm are possible, 
and may be related to other methods. For example, it 
reduces to the 1/t method in the limit obtained by the 
following modifications: (1) Replace the Gibbs sampler 
by a simple nearest neighbor Metropolis step. (2) Re- 
place the weight histogram W m by a simple histogram 
H m of visited m. (3) Update the free energy parameters 
after every step. Since the histogram after a visit to m is 
Hk = N/M+5km, the free energy update becomes Afk = 
-\n(H k M/(N+l)) = -ln(l+4mA//iV)+ln(l + l/JV) w 
— SkmM/N + 1/N, where the approximation holds when 
N 3> M. The last term represents a constant shift of all 
fk and can be dropped. The resulting update rule is thus 
simply f m <— f m — M/N, leaving all other unmodified. 
This corresponds exactly to the 1 ft method Q discussed 
earlier, and provides a new perspective on and additional 
justification for that update scheme. 



III. BENCHMARKS OF THE METHOD 

To study the performance of the method and com- 
pare it with other ones we apply it to the Ising model 
and a spin glass. We carry out a simulated tempering 
simulation, i.e., we choose as parameter A the tempera- 
ture. The algorithm alternates between ordinary canon- 
ical Metropolis MC updates in which randomly chosen 
spins are flipped with probability min(l, e _/3A£ ), and up- 
dates which change the temperature, leaving the spin 
configuration and the energy £ unchanged. In the lat- 
ter ones a new temperature T m < = l//3 m < is chosen with 
the probability 



(11) 



A. Two-dimensional Ising model 

The two-dimensional (2D) Ising model is a common 
test case, since its free energy can be calculated ex- 
actly 11]. We choose M — 128 temperatures evenly 
spaced in the interval [1.8,3], which includes the criti- 
cal temperature T c = 2/ln(l + y/2) « 2.27. The system 
size is L — 64 and we use 100 000 iterations, each last- 
ing for 1000 MC sweeps, in total 10 8 sweeps, where each 
MC sweep corresponds to one update trial per spin. A 
temperature move is attempted after each MC sweep. 

During the initial stages we use the scheme discussed 
in Sec. Ill Al to get an initial guess for the f m and a 
prior weight W pi i OT : At the start of the simulation we 
set fk — Pk^Q, where £ = —2L 2 is the ground state 
energy, and Wk — 1/M. Then we check, after each iter- 
ation, whether all M temperatures have been visited at 
least twice during the simulation so far. If not, the effec- 
tive number of samples is reset to N — M'/M, where M' 
is the number of temperatures which actually were vis- 
ited twice. When all temperatures have been visited we 
have a reasonable initial guess of / m , and may continue 
the simulation as described in Sec. [Ill with Wk > 1. Fur- 
thermore, we also monitor the histogram of visited tem- 
peratures to look for anomalous deviations, which would 
indicate that the initial guess was not so good after all. 
Thus, we restart the simulation (i.e., we set N = M, 
Wk = 1 and reset the calculations of any observables, 
but do not touch the fk) should the histogram of vis- 
ited temperatures H m at some point fall below 2% of its 
mean. This happened in about half of the simulation 
runs, typically within the first 50 iterations. 

To benchmark the method we plot, in Fig. [Ha), the 
mean absolute deviation 



M-l 



5F 



M — 



fm ~t~ ^rr 



(12) 



m— 1 



of consecutive free energy differences against the num- 
ber of samples. Here F m — F(T m )/T m is the exact di- 
mensionless free energy. For comparison we also include 
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FIG. 1. (a) Mean absolute deviation of estimated and exact free energy differences 8F of the 64 x 64 Ising model as a function 
of number of samples TV. From top to bottom: results from Wang-Landau iterations, the 1/t method, and the AWH method. 
Also included is the average behavior of the AWH method and a curve showing a 1/yN dependence. Inset: Difference between 
estimated and exact free energy. The error bars (the shaded area) represent one standard deviation, (b) As in (a), but for an 
8x8x8 Ising spin glass. 



results from simulations using Wang-Landau iterations 
(with flatness criteria H m [ n > 0.4i? mcan ) and the 1/t 
method. For large times, the error for both the 1/t and 
our method decrease as 1/yN, whereas it saturates for 
the Wang-Landau method. For a given number of sam- 
ples, the accuracy of the AWH method is almost one or- 
der of magnitude better than the 1/t method. The inset 
shows the difference between the final estimate, obtained 
by combining 40 independent simulations using Eq. (fl"0| , 
and the true free energy over the temperature range. The 
error bars are estimated using the jackknife method. 

Another useful measure of the efficiency is the tunnel- 
ing time, i.e., the time to go from the highest temperature 
to the lowest or vice versa. This time was significantly 
reduced, nearly by a factor of two, from ~ 40 000 MC 
sweeps for the 1/t to ~ 21 000 for the AWH method. It 
should be noted that the dynamics suffer severely from 
critical slowing down in the vicinity of the phase tran- 
sition, which constitutes a bottleneck for the movement 
along the temperature axis. While the extended tem- 
perature ensemble methods are effective for crossing en- 
ergy barriers, they do not overcome this slowing down by 
themselves. In this sense the 2D Ising model (using sin- 
gle spin flip dynamics) is not a particularly favorable test 
case. However, the methods can easily be combined with 
cluster methods, if available, which do overcome the crit- 
ical slowing down. Replacing the single spin flip moves 
by, e.g., Wolff cluster updates [3 for \T m -T c \ < 0.1 (the 
cluster moves being most effective in the critical region) 
in the example above practically eliminates the bottle- 
neck and further reduces the tunneling time by an addi- 



tional factor fa 10 to about 2200, for the AWH method. 
The 1/t method on the other hand only gained a factor 
of two. 

As discussed in Sec. |H] one of the advantages of the 
AWH method is the insensitivity to the spacing of pa- 
rameter values X m . Indeed, varying the number of tem- 
peratures from M = 32, 64, 128 up to 256, had negligible 
effect on the performance of the algorithm, both in terms 
of the accuracy of the final free energy estimate and the 
tunneling time, while the increase in the run time of the 
simulation was marginal (and could be practically elim- 
inated using the update rule 2'). Upon decreasing M 
below 16, on the other hand, the performance quickly 
dropped. 



B. Three-dimensional Ising spin glass 

Next we apply the method to the three-dimensional 
Ising spin glass with Gaussian couplings. This model 
has a disorder-dominated glass phase at low tempera- 
tures T < T g sa 0.95 [l3l |. with a very rough energy land- 
scape, making it extremely challenging to study using 
conventional simulations. The system size is L = 8, and 
we use M = 200 temperatures logarithmically spaced in 
[0.7,3.5]. Figure QJb) compares the convergence of the 
different methods for one particular random realization 
of the couplings. As there is no exact solution to com- 
pare with we use as reference instead the best estimate 
obtained from 80 different runs (with an estimated stan- 
dard error < 0.002). Here, the gain in accuracy, com- 
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pared to the 1/t method, is more than an order of mag- 
nitude. The tunneling time, i.e., the time to go between 
the high- and low-temperature extremes, is also signifi- 
cantly shorter, by nearly a factor of 20. 

IV. SUMMARY AND CONCLUSIONS 

Let us reiterate the advantages of the AWH method: 
Allowing for large steps gives a fast diffusion along the 
parameter axis. As a result, the spacing between neigh- 
boring values in the discretized parameter space is not 
critical as long as it is small enough and does not re- 
quire any fine tuning to perform well. We make efficient 
use of the data collected at all stages of the simulation. 
This is done by reweighting on the fly the samples taken 
at the current parameter value to a whole range of dif- 
ferent parameter values. The information needed for this 
reweighting procedure is essentially the same as what en- 
ables the large steps. The data taken at earlier iterations 
are not thrown away, but are instead used together with 
the new data to refine the estimate of the free energy pa- 
rameters. Since the weights are constant during each it- 



eration, the data collected will, after an initial relaxation, 
be in equilibrium and can be used for the calculation of 
any desired averages. 

Altogether, these properties make up a very convenient 
method for sampling models with rough energy land- 
scapes, and for the calculation of free energy differences. 
It should be emphasized that it is the combination of 
the Gibbs sampler, the reweighting scheme, and the up- 
date rule using the weight histogram, which leads to the 
dramatic improvements. The method is very general, is 
simple to implement, and can be applied to a broad range 
of problems in statistical physics, biophysics, statistics, 
etc. Further improvements are likely, especially when it 
comes to the heuristic scheme used during the early-stage 
bootstrap. 
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